Overview

Dataset statistics

Number of variables: 20
Number of observations: 64068
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 30.0 MiB
Average record size in memory: 490.6 B
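
The average record size follows from the total memory footprint and the row count. The sketch below redoes that division from the two rounded figures reported above; the function name is illustrative, and because 30.0 MiB is itself rounded the result only approximates the reported 490.6 B.

```python
# Figures copied from the "Dataset statistics" block above.
TOTAL_SIZE_MIB = 30.0      # reported total size in memory (rounded to 0.1 MiB)
N_OBSERVATIONS = 64068     # reported number of observations

BYTES_PER_MIB = 1024 ** 2


def average_record_size(total_mib: float, n_rows: int) -> float:
    """Return the mean number of bytes held per row."""
    return total_mib * BYTES_PER_MIB / n_rows


approx_bytes = average_record_size(TOTAL_SIZE_MIB, N_OBSERVATIONS)
print(f"{approx_bytes:.1f} B per record")  # close to the reported 490.6 B
```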

Variable types

Numeric: 12
DateTime: 1
Categorical: 7

Warnings

Company has constant value "Pink Cab" (Constant)
df_index is highly correlated with Transaction ID and 1 other field (High correlation)
Transaction ID is highly correlated with df_index and 1 other field (High correlation)
KM Travelled is highly correlated with Price Charged and 1 other field (High correlation)
Price Charged is highly correlated with KM Travelled and 1 other field (High correlation)
Cost of Trip is highly correlated with KM Travelled and 1 other field (High correlation)
Year is highly correlated with df_index and 1 other field (High correlation)
Company is highly correlated with Year and 5 other fields (High correlation)
Year is highly correlated with Company (High correlation)
City is highly correlated with Company (High correlation)
Holiday is highly correlated with Company (High correlation)
Day of Week is highly correlated with Company (High correlation)
Payment_Mode is highly correlated with Company (High correlation)
Gender is highly correlated with Company (High correlation)
df_index has unique values (Unique)
Transaction ID has unique values (Unique)
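
The "Constant" and "Unique" warnings can be reproduced with a simple distinct-count check. The following is a minimal sketch, not the profiler's own implementation: `flag_columns` is a hypothetical helper, and the toy rows are made up for illustration rather than taken from the dataset.

```python
from typing import Dict, List, Sequence


def flag_columns(columns: Dict[str, Sequence]) -> Dict[str, List[str]]:
    """Return {column name: [flags]} for constant and all-unique columns."""
    flags: Dict[str, List[str]] = {}
    for name, values in columns.items():
        distinct = len(set(values))
        found = []
        if distinct == 1:
            found.append("Constant")           # a single value in every row
        if distinct == len(values) and len(values) > 1:
            found.append("Unique")             # no value occurs twice
        if found:
            flags[name] = found
    return flags


# Toy data (hypothetical rows, not taken from the dataset).
toy = {
    "Company": ["Pink Cab", "Pink Cab", "Pink Cab"],
    "Transaction ID": [10000011, 10000012, 10000013],
    "Payment_Mode": ["Cash", "Card", "Card"],
}
result = flag_columns(toy)
print(result)
```

A constant column such as Company carries no information for modelling, which is why the report also marks it as rejected.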

Reproduction

Analysis started: 2021-02-27 19:45:57.261043
Analysis finished: 2021-02-27 19:46:37.043638
Duration: 39.78 seconds
Software version: pandas-profiling v2.10.1
Download configuration: config.yaml
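
The reported duration is the difference between the two timestamps. A short check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block above.
started = datetime.fromisoformat("2021-02-27 19:45:57.261043")
finished = datetime.fromisoformat("2021-02-27 19:46:37.043638")

duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")
```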

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 64068
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 183900.9572
Minimum: 8
Maximum: 359390
Zeros: 0
Zeros (%): 0.0%
Memory size: 500.7 KiB

Quantile statistics

Minimum: 8
5-th percentile: 22632.7
Q1: 91735.75
median: 186113.5
Q3: 275792
95-th percentile: 342569.3
Maximum: 359390
Range: 359382
Interquartile range (IQR): 184056.25
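
Range and IQR are derived from the other quantile figures: range is maximum minus minimum, and IQR is Q3 minus Q1. A small sketch with the df_index values listed above (the helper name `spread` is illustrative):

```python
def spread(minimum: float, q1: float, q3: float, maximum: float) -> dict:
    """Return the range and interquartile range from four summary values."""
    return {"range": maximum - minimum, "iqr": q3 - q1}


# df_index quantile statistics from the report.
df_index_spread = spread(minimum=8, q1=91735.75, q3=275792, maximum=359390)
print(df_index_spread)
```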

Descriptive statistics

Standard deviation: 103392.1525
Coefficient of variation (CV): 0.5622165
Kurtosis: -1.204769336
Mean: 183900.9572
Median Absolute Deviation (MAD): 92085
Skewness: -0.02088681297
Sum: 1.178216652 × 10^10
Variance: 1.06899372 × 10^10
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
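
Two of the descriptive statistics are simple functions of the others: the coefficient of variation is the standard deviation divided by the mean, and the variance is the standard deviation squared. The check below uses the df_index figures from the report.

```python
import math

# df_index figures copied from the report.
MEAN = 183900.9572
STD = 103392.1525

cv = STD / MEAN        # coefficient of variation
variance = STD ** 2    # variance is the square of the standard deviation

print(f"CV = {cv:.7f}")
print(f"Variance = {variance:.6e}")
```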
Value                 Count  Frequency (%)
264191                1      < 0.1%
160475                1      < 0.1%
209651                1      < 0.1%
80626                 1      < 0.1%
207600                1      < 0.1%
119535                1      < 0.1%
246509                1      < 0.1%
258795                1      < 0.1%
316348                1      < 0.1%
256744                1      < 0.1%
Other values (64058)  64058  > 99.9%

Value  Count  Frequency (%)
8      1      < 0.1%
9      1      < 0.1%
14     1      < 0.1%
19     1      < 0.1%
30     1      < 0.1%
45     1      < 0.1%
48     1      < 0.1%
54     1      < 0.1%
55     1      < 0.1%
65     1      < 0.1%

Value   Count  Frequency (%)
359390  1      < 0.1%
359389  1      < 0.1%
359388  1      < 0.1%
359386  1      < 0.1%
359385  1      < 0.1%
359384  1      < 0.1%
359369  1      < 0.1%
359368  1      < 0.1%
359364  1      < 0.1%
359362  1      < 0.1%

Transaction ID
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 64068
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10224673.33
Minimum: 10000011
Maximum: 10437212
Zeros: 0
Zeros (%): 0.0%
Memory size: 500.7 KiB

Quantile statistics

Minimum: 10000011
5-th percentile: 10027801.35
Q1: 10112718.75
median: 10226083.5
Q3: 10338269.25
95-th percentile: 10416460.65
Maximum: 10437212
Range: 437201
Interquartile range (IQR): 225550.5

Descriptive statistics

Standard deviation: 126258.8704
Coefficient of variation (CV): 0.01234845029
Kurtosis: -1.209315365
Mean: 10224673.33
Median Absolute Deviation (MAD): 113136
Skewness: -0.02101934193
Sum: 6.550743711 × 10^11
Variance: 1.594130235 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
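
The monotonicity entry describes how a column behaves in row order: df_index is strictly increasing, while Transaction ID is not monotonic. The classifier below is a minimal sketch of such a check, with the function name and toy inputs chosen for illustration; it is not the profiler's own code.

```python
from typing import Sequence


def monotonicity(values: Sequence[float]) -> str:
    """Classify a sequence by comparing each element with its successor."""
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"


# Toy sequences (hypothetical, not rows from the dataset).
print(monotonicity([8, 9, 14, 19]))                      # index-like column
print(monotonicity([10000011, 10000013, 10000012]))      # IDs out of order
```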
Value                 Count  Frequency (%)
10105962              1      < 0.1%
10087976              1      < 0.1%
10051622              1      < 0.1%
10301629              1      < 0.1%
10124635              1      < 0.1%
10004363              1      < 0.1%
10416016              1      < 0.1%
10120508              1      < 0.1%
10124654              1      < 0.1%
10407549              1      < 0.1%
Other values (64058)  64058  > 99.9%

Value     Count  Frequency (%)
10000011  1      < 0.1%
10000012  1      < 0.1%
10000013  1      < 0.1%
10000014  1      < 0.1%
10000015  1      < 0.1%
10000016  1      < 0.1%
10000017  1      < 0.1%
10000018  1      < 0.1%
10000019  1      < 0.1%
10000020  1      < 0.1%

Value     Count  Frequency (%)
10437212  1      < 0.1%
10437198  1      < 0.1%
10437196  1      < 0.1%
10437194  1      < 0.1%
10437193  1      < 0.1%
10437191  1      < 0.1%
10437190  1      < 0.1%
10437189  1      < 0.1%
10437188  1      < 0.1%
10437187  1      < 0.1%

Distinct: 1095
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 500.7 KiB
Minimum: 2016-01-02 00:00:00
Maximum: 2018-12-31 00:00:00
Histogram with fixed size bins (bins=50)

Company
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 MiB
Pink Cab: 64068

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 512544
Distinct characters: 8
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Pink Cab
2nd row: Pink Cab
3rd row: Pink Cab
4th row: Pink Cab
5th row: Pink Cab

Value     Count  Frequency (%)
Pink Cab  64068  100.0%

Histogram of lengths of the category

Value  Count  Frequency (%)
pink   64068  50.0%
cab    64068  50.0%

Most occurring characters

ValueCountFrequency (%)
P64068
12.5%
i64068
12.5%
n64068
12.5%
k64068
12.5%
64068
12.5%
C64068
12.5%
a64068
12.5%
b64068
12.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter320340
62.5%
Uppercase Letter128136
 
25.0%
Space Separator64068
 
12.5%

Most frequent character per category

ValueCountFrequency (%)
i64068
20.0%
n64068
20.0%
k64068
20.0%
a64068
20.0%
b64068
20.0%
ValueCountFrequency (%)
P64068
50.0%
C64068
50.0%
ValueCountFrequency (%)
64068
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin448476
87.5%
Common64068
 
12.5%

Most frequent character per script

ValueCountFrequency (%)
P64068
14.3%
i64068
14.3%
n64068
14.3%
k64068
14.3%
C64068
14.3%
a64068
14.3%
b64068
14.3%
ValueCountFrequency (%)
64068
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII512544
100.0%

Most frequent character per block

ValueCountFrequency (%)
P64068
12.5%
i64068
12.5%
n64068
12.5%
k64068
12.5%
64068
12.5%
C64068
12.5%
a64068
12.5%
b64068
12.5%

City
Categorical

HIGH CORRELATION

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
LOS ANGELES CA: 19865
NEW YORK NY: 13967
CHICAGO IL: 9361
BOSTON MA: 5186
MIAMI FL: 2002
Other values (10): 13687

Length

Max length: 14
Median length: 11
Mean length: 11.49781482
Min length: 8

Characters and Unicode

Total characters: 736642
Distinct characters: 25
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PHOENIX AZ
2nd row: PHOENIX AZ
3rd row: NEW YORK NY
4th row: LOS ANGELES CA
5th row: CHICAGO IL

Value             Count  Frequency (%)
LOS ANGELES CA    19865  31.0%
NEW YORK NY       13967  21.8%
CHICAGO IL        9361   14.6%
BOSTON MA         5186   8.1%
MIAMI FL          2002   3.1%
AUSTIN TX         1868   2.9%
NASHVILLE TN      1841   2.9%
ATLANTA GA        1762   2.8%
ORANGE COUNTY     1513   2.4%
DENVER CO         1394   2.2%
Other values (5)  5309   8.3%
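
The frequency column is each count divided by the 64068 observations, shown to one decimal place, with very small and very large shares displayed as "< 0.1%" and "> 99.9%". The sketch below reproduces that formatting under the assumption that the edge cases are triggered when the fraction rounds to 0 or 1 at three decimals; `frequency_label` is a hypothetical helper, not a documented API.

```python
N_OBSERVATIONS = 64068  # number of observations reported in the overview


def frequency_label(count: int, total: int = N_OBSERVATIONS) -> str:
    """Format count/total the way the report's frequency column appears."""
    share = count / total
    if share > 0 and round(share, 3) == 0:
        return "< 0.1%"      # non-zero but below the display precision
    if share < 1 and round(share, 3) == 1:
        return "> 99.9%"     # nearly, but not exactly, everything
    return f"{share * 100:.1f}%"


print(frequency_label(19865))   # LOS ANGELES CA
print(frequency_label(15))      # 15 distinct cities out of 64068 rows
print(frequency_label(64058))   # "Other values" row of a unique column
```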
Histogram of lengths of the category

Value              Count  Frequency (%)
ca                 22248  13.6%
angeles            19865  12.2%
los                19865  12.2%
new                13967  8.6%
york               13967  8.6%
ny                 13967  8.6%
chicago            9361   5.7%
il                 9361   5.7%
boston             5186   3.2%
ma                 5186   3.2%
Other values (20)  30044  18.4%

Most occurring characters

ValueCountFrequency (%)
98949
13.4%
A78955
10.7%
N67964
9.2%
E63086
8.6%
O61232
8.3%
L59297
8.0%
S53070
 
7.2%
C45211
 
6.1%
G34232
 
4.6%
Y29447
 
4.0%
Other values (15)145199
19.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter637693
86.6%
Space Separator98949
 
13.4%

Most frequent character per category

ValueCountFrequency (%)
A78955
12.4%
N67964
10.7%
E63086
9.9%
O61232
9.6%
L59297
9.3%
S53070
8.3%
C45211
 
7.1%
G34232
 
5.4%
Y29447
 
4.6%
I29030
 
4.6%
Other values (14)116169
18.2%
ValueCountFrequency (%)
98949
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin637693
86.6%
Common98949
 
13.4%

Most frequent character per script

ValueCountFrequency (%)
A78955
12.4%
N67964
10.7%
E63086
9.9%
O61232
9.6%
L59297
9.3%
S53070
8.3%
C45211
 
7.1%
G34232
 
5.4%
Y29447
 
4.6%
I29030
 
4.6%
Other values (14)116169
18.2%
ValueCountFrequency (%)
98949
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII736642
100.0%

Most frequent character per block

ValueCountFrequency (%)
98949
13.4%
A78955
10.7%
N67964
9.2%
E63086
8.6%
O61232
8.3%
L59297
8.0%
S53070
 
7.2%
C45211
 
6.1%
G34232
 
4.6%
Y29447
 
4.0%
Other values (15)145199
19.7%

KM Travelled
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 874
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.55923987
Minimum: 1.9
Maximum: 48
Zeros: 0
Zeros (%): 0.0%
Memory size: 500.7 KiB

Quantile statistics

Minimum: 1.9
5-th percentile: 3.5805
Q1: 12
median: 22.44
Q3: 32.86
95-th percentile: 42
Maximum: 48
Range: 46.1
Interquartile range (IQR): 20.86

Descriptive statistics

Standard deviation: 12.21873606
Coefficient of variation (CV): 0.5416288906
Kurtosis: -1.126642246
Mean: 22.55923987
Median Absolute Deviation (MAD): 10.44
Skewness: 0.05570736882
Sum: 1445325.38
Variance: 149.297511
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
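
The median absolute deviation (MAD) is the median of the absolute distances from the median, which makes it less sensitive to extreme trips than the standard deviation. A minimal sketch, assuming this unscaled definition is the one the report uses; the sample distances are made up for illustration.

```python
from statistics import median
from typing import Sequence


def median_absolute_deviation(values: Sequence[float]) -> float:
    """Median of absolute deviations from the median (no scaling factor)."""
    centre = median(values)
    return median(abs(v - centre) for v in values)


# Hypothetical trip distances in km, with one outlier.
toy_km = [1, 2, 3, 4, 100]
toy_mad = median_absolute_deviation(toy_km)
print(toy_mad)
```

Here the median is 3 and the absolute deviations are 2, 1, 0, 1 and 97, so the outlier barely moves the result.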
Value               Count  Frequency (%)
33.6                271    0.4%
39.6                202    0.3%
22.8                195    0.3%
24                  195    0.3%
35.7                190    0.3%
37.44               185    0.3%
16.8                183    0.3%
28.08               180    0.3%
27                  152    0.2%
42.18               150    0.2%
Other values (864)  62165  97.0%

Value  Count  Frequency (%)
1.9    50     0.1%
1.92   63     0.1%
1.94   57     0.1%
1.96   75     0.1%
1.98   68     0.1%
2      63     0.1%
2.02   56     0.1%
2.04   61     0.1%
2.06   68     0.1%
2.08   76     0.1%

Value  Count  Frequency (%)
48     62     0.1%
47.6   50     0.1%
47.2   57     0.1%
46.8   143    0.2%
46.41  65     0.1%
46.4   61     0.1%
46.02  59     0.1%
46     57     0.1%
45.63  73     0.1%
45.6   135    0.2%

Price Charged
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 40501
Distinct (%): 63.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 311.0193337
Minimum: 15.6
Maximum: 1623.48
Zeros: 0
Zeros (%): 0.0%
Memory size: 500.7 KiB

Quantile statistics

Minimum: 15.6
5-th percentile: 48.8435
Q1: 159.8325
median: 297.6
Q3: 440.75
95-th percentile: 625.7195
Maximum: 1623.48
Range: 1607.88
Interquartile range (IQR): 280.9175

Descriptive statistics

Standard deviation: 183.0481327
Coefficient of variation (CV): 0.5885426172
Kurtosis: -0.08453434762
Mean: 311.0193337
Median Absolute Deviation (MAD): 140.4
Skewness: 0.4909935408
Sum: 19926386.67
Variance: 33506.61887
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
180.35                8      < 0.1%
56.1                  7      < 0.1%
204.21                7      < 0.1%
459.24                7      < 0.1%
178.49                7      < 0.1%
367.68                7      < 0.1%
169.27                7      < 0.1%
393.51                7      < 0.1%
195.29                7      < 0.1%
248.41                7      < 0.1%
Other values (40491)  63997  99.9%

Value  Count  Frequency (%)
15.6   1      < 0.1%
15.75  1      < 0.1%
16.38  1      < 0.1%
16.53  1      < 0.1%
16.76  1      < 0.1%
17.03  1      < 0.1%
17.11  1      < 0.1%
17.21  1      < 0.1%
17.27  1      < 0.1%
17.46  1      < 0.1%

Value    Count  Frequency (%)
1623.48  1      < 0.1%
1517.15  1      < 0.1%
1495.6   1      < 0.1%
1377.73  1      < 0.1%
1368.66  1      < 0.1%
1359.59  1      < 0.1%
1339.31  1      < 0.1%
1332.98  1      < 0.1%
1319.52  1      < 0.1%
1235.96  1      < 0.1%

Cost of Trip
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 9659
Distinct (%): 15.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 248.1118607
Minimum: 19.19
Maximum: 576
Zeros: 0
Zeros (%): 0.0%
Memory size: 500.7 KiB

Quantile statistics

Minimum: 19.19
5-th percentile: 40.341
Q1: 132
median: 246.42
Q3: 360.126
95-th percentile: 463.904
Maximum: 576
Range: 556.81
Interquartile range (IQR): 228.126

Descriptive statistics

Standard deviation: 135.2614952
Coefficient of variation (CV): 0.5451633582
Kurtosis: -1.079802523
Mean: 248.1118607
Median Absolute Deviation (MAD): 114.024
Skewness: 0.08607470702
Sum: 15896030.69
Variance: 18295.67208
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
428.4                42     0.1%
393.12               40     0.1%
280.8                36     0.1%
342.72               35     0.1%
205.2                34     0.1%
181.44               34     0.1%
383.04               33     0.1%
336                  33     0.1%
356.4                32     < 0.1%
399.84               32     < 0.1%
Other values (9649)  63717  99.5%

Value   Count  Frequency (%)
19.19   4      < 0.1%
19.2    3      < 0.1%
19.38   2      < 0.1%
19.392  1      < 0.1%
19.4    3      < 0.1%
19.57   3      < 0.1%
19.584  1      < 0.1%
19.594  3      < 0.1%
19.6    2      < 0.1%
19.76   2      < 0.1%

Value    Count  Frequency (%)
576      3      < 0.1%
571.2    4      < 0.1%
566.44   2      < 0.1%
566.4    6      < 0.1%
561.68   5      < 0.1%
561.6    10     < 0.1%
556.96   2      < 0.1%
556.92   10     < 0.1%
556.8    9      < 0.1%
552.279  5      < 0.1%

Customer ID
Real number (ℝ≥0)

Distinct: 22955
Distinct (%): 35.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15501.70488
Minimum: 1
Maximum: 60000
Zeros: 0
Zeros (%): 0.0%
Memory size: 500.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 706
Q1: 3664.75
median: 7311
Q3: 22056.25
95-th percentile: 58167.65
Maximum: 60000
Range: 59999
Interquartile range (IQR): 18391.5

Descriptive statistics

Standard deviation: 18264.4499
Coefficient of variation (CV): 1.178222012
Kurtosis: 0.6252787027
Mean: 15501.70488
Median Absolute Deviation (MAD): 4505
Skewness: 1.433872799
Sum: 993163228
Variance: 333590130.3
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
8120                  18     < 0.1%
8595                  17     < 0.1%
7927                  17     < 0.1%
6159                  17     < 0.1%
7340                  16     < 0.1%
8474                  16     < 0.1%
8915                  16     < 0.1%
7988                  15     < 0.1%
7764                  15     < 0.1%
8721                  15     < 0.1%
Other values (22945)  63906  99.7%

Value  Count  Frequency (%)
1      4      < 0.1%
2      4      < 0.1%
3      6      < 0.1%
4      1      < 0.1%
5      8      < 0.1%
6      5      < 0.1%
7      2      < 0.1%
8      6      < 0.1%
9      5      < 0.1%
10     3      < 0.1%

Value  Count  Frequency (%)
60000  4      < 0.1%
59999  2      < 0.1%
59998  3      < 0.1%
59997  2      < 0.1%
59995  2      < 0.1%
59994  3      < 0.1%
59993  1      < 0.1%
59992  3      < 0.1%
59991  2      < 0.1%
59990  2      < 0.1%

Payment_Mode
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 MiB
Card: 38290
Cash: 25778

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 256272
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Cash
2nd row: Card
3rd row: Card
4th row: Card
5th row: Card

Value  Count  Frequency (%)
Card   38290  59.8%
Cash   25778  40.2%

Histogram of lengths of the category

Value  Count  Frequency (%)
card   38290  59.8%
cash   25778  40.2%

Most occurring characters

ValueCountFrequency (%)
C64068
25.0%
a64068
25.0%
r38290
14.9%
d38290
14.9%
s25778
10.1%
h25778
10.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter192204
75.0%
Uppercase Letter64068
 
25.0%

Most frequent character per category

ValueCountFrequency (%)
a64068
33.3%
r38290
19.9%
d38290
19.9%
s25778
13.4%
h25778
13.4%
ValueCountFrequency (%)
C64068
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin256272
100.0%

Most frequent character per script

ValueCountFrequency (%)
C64068
25.0%
a64068
25.0%
r38290
14.9%
d38290
14.9%
s25778
10.1%
h25778
10.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII256272
100.0%

Most frequent character per block

ValueCountFrequency (%)
C64068
25.0%
a64068
25.0%
r38290
14.9%
d38290
14.9%
s25778
10.1%
h25778
10.1%

Gender
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 MiB
Male: 36490
Female: 27578

Length

Max length: 6
Median length: 4
Mean length: 4.860897796
Min length: 4
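
The mean length is a count-weighted average of the label lengths, which can be rebuilt from the two category counts in this section. The sketch below also recovers the total character count reported for this variable (311428).

```python
# Category counts copied from the Gender section of the report.
counts = {"Male": 36490, "Female": 27578}

total_rows = sum(counts.values())
total_chars = sum(len(label) * n for label, n in counts.items())
mean_length = total_chars / total_rows

print(total_rows, total_chars, round(mean_length, 6))
```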

Characters and Unicode

Total characters: 311428
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male
2nd row: Male
3rd row: Male
4th row: Male
5th row: Male

Value   Count  Frequency (%)
Male    36490  57.0%
Female  27578  43.0%

Histogram of lengths of the category

Value   Count  Frequency (%)
male    36490  57.0%
female  27578  43.0%

Most occurring characters

ValueCountFrequency (%)
e91646
29.4%
a64068
20.6%
l64068
20.6%
M36490
 
11.7%
F27578
 
8.9%
m27578
 
8.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter247360
79.4%
Uppercase Letter64068
 
20.6%

Most frequent character per category

ValueCountFrequency (%)
e91646
37.0%
a64068
25.9%
l64068
25.9%
m27578
 
11.1%
ValueCountFrequency (%)
M36490
57.0%
F27578
43.0%

Most occurring scripts

ValueCountFrequency (%)
Latin311428
100.0%

Most frequent character per script

ValueCountFrequency (%)
e91646
29.4%
a64068
20.6%
l64068
20.6%
M36490
 
11.7%
F27578
 
8.9%
m27578
 
8.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII311428
100.0%

Most frequent character per block

ValueCountFrequency (%)
e91646
29.4%
a64068
20.6%
l64068
20.6%
M36490
 
11.7%
F27578
 
8.9%
m27578
 
8.9%

Age
Real number (ℝ≥0)

Distinct: 48
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.40116751
Minimum: 18
Maximum: 65
Zeros: 0
Zeros (%): 0.0%
Memory size: 500.7 KiB

Quantile statistics

Minimum: 18
5-th percentile: 19
Q1: 25
median: 33
Q3: 42
95-th percentile: 61
Maximum: 65
Range: 47
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 12.67668283
Coefficient of variation (CV): 0.3580865752
Kurtosis: -0.4778097077
Mean: 35.40116751
Median Absolute Deviation (MAD): 8
Skewness: 0.6826095716
Sum: 2268082
Variance: 160.6982876
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=48)
Value              Count  Frequency (%)
23                 2237   3.5%
32                 2210   3.4%
26                 2210   3.4%
19                 2195   3.4%
20                 2126   3.3%
25                 2113   3.3%
39                 2104   3.3%
40                 2076   3.2%
22                 2069   3.2%
34                 2065   3.2%
Other values (38)  42663  66.6%

Value  Count  Frequency (%)
18     1924   3.0%
19     2195   3.4%
20     2126   3.3%
21     1996   3.1%
22     2069   3.2%
23     2237   3.5%
24     1954   3.0%
25     2113   3.3%
26     2210   3.4%
27     2050   3.2%

Value  Count  Frequency (%)
65     632    1.0%
64     701    1.1%
63     718    1.1%
62     674    1.1%
61     794    1.2%
60     672    1.0%
59     717    1.1%
58     755    1.2%
57     682    1.1%
56     633    1.0%

Income (USD/Month)
Real number (ℝ≥0)

Distinct: 15644
Distinct (%): 24.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15082.44144
Minimum: 2001
Maximum: 35000
Zeros: 0
Zeros (%): 0.0%
Memory size: 500.7 KiB

Quantile statistics

Minimum: 2001
5-th percentile: 3241.35
Q1: 8387
median: 14761.5
Q3: 21079.25
95-th percentile: 29704
Maximum: 35000
Range: 32999
Interquartile range (IQR): 12692.25

Descriptive statistics

Standard deviation: 7996.215513
Coefficient of variation (CV): 0.5301671845
Kurtosis: -0.6750578341
Mean: 15082.44144
Median Absolute Deviation (MAD): 6338.5
Skewness: 0.3007725391
Sum: 966301858
Variance: 63939462.52
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
15515                 33     0.1%
15818                 28     < 0.1%
5532                  28     < 0.1%
18178                 27     < 0.1%
6099                  26     < 0.1%
16809                 25     < 0.1%
20884                 25     < 0.1%
14300                 25     < 0.1%
7355                  25     < 0.1%
16965                 25     < 0.1%
Other values (15634)  63801  99.6%

Value  Count  Frequency (%)
2001   1      < 0.1%
2002   2      < 0.1%
2007   11     < 0.1%
2009   1      < 0.1%
2012   7      < 0.1%
2015   1      < 0.1%
2019   2      < 0.1%
2020   4      < 0.1%
2021   3      < 0.1%
2022   3      < 0.1%

Value  Count  Frequency (%)
35000  1      < 0.1%
34995  2      < 0.1%
34989  7      < 0.1%
34985  2      < 0.1%
34984  14     < 0.1%
34983  1      < 0.1%
34979  1      < 0.1%
34973  1      < 0.1%
34972  2      < 0.1%
34967  6      < 0.1%

Population
Real number (ℝ≥0)

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2833514.834
Minimum: 248968
Maximum: 8405837
Zeros: 0
Zeros (%): 0.0%
Memory size: 500.7 KiB

Quantile statistics

Minimum: 248968
5-th percentile: 248968
Q1: 943999
median: 1595037
Q3: 1955130
95-th percentile: 8405837
Maximum: 8405837
Range: 8156869
Interquartile range (IQR): 1011131

Descriptive statistics

Standard deviation: 2985291.418
Coefficient of variation (CV): 1.053564775
Kurtosis: -0.2407254656
Mean: 2833514.834
Median Absolute Deviation (MAD): 564852
Skewness: 1.259678092
Sum: 1.815376284 × 10^11
Variance: 8.911964847 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=15)
Value             Count  Frequency (%)
1595037           19865  31.0%
8405837           13967  21.8%
1955130           9361   14.6%
248968            5186   8.1%
1339155           2002   3.1%
698371            1868   2.9%
327225            1841   2.9%
814885            1762   2.8%
1030185           1513   2.4%
754233            1394   2.2%
Other values (5)  5309   8.3%

Value   Count  Frequency (%)
248968  5186   8.1%
327225  1841   2.9%
542085  682    1.1%
545776  1334   2.1%
698371  1868   2.9%
754233  1394   2.2%
814885  1762   2.8%
942908  1380   2.2%
943999  864    1.3%
959307  1049   1.6%

Value    Count  Frequency (%)
8405837  13967  21.8%
1955130  9361   14.6%
1595037  19865  31.0%
1339155  2002   3.1%
1030185  1513   2.4%
959307   1049   1.6%
943999   864    1.3%
942908   1380   2.2%
814885   1762   2.8%
754233   1394   2.2%
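
Because Population has only 15 distinct values, the lowest-10 and highest-10 lists above together cover every value, so the reported Sum and Mean can be rebuilt from the frequency tables alone. The sketch below does this with exact integer arithmetic; the dictionary is transcribed from the tables in this section.

```python
# Population value -> number of rows, transcribed from the tables above.
population_counts = {
    248968: 5186, 327225: 1841, 542085: 682, 545776: 1334, 698371: 1868,
    754233: 1394, 814885: 1762, 942908: 1380, 943999: 864, 959307: 1049,
    1030185: 1513, 1339155: 2002, 1595037: 19865, 1955130: 9361,
    8405837: 13967,
}

n_rows = sum(population_counts.values())
total = sum(value * count for value, count in population_counts.items())
mean = total / n_rows

print(n_rows)            # should equal the number of observations
print(f"{total:.9e}")    # compare with the reported Sum
print(f"{mean:.3f}")     # compare with the reported Mean
```

The row counts per value match the City counts (for example 19865 rows for both LOS ANGELES CA and 1595037), which suggests, though the report does not say so, that Population is a per-city attribute repeated on every trip.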

Users
Real number (ℝ≥0)

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 145470.1403
Minimum: 3643
Maximum: 302149
Zeros: 0
Zeros (%): 0.0%
Memory size: 500.7 KiB

Quantile statistics

Minimum: 3643
5-th percentile: 9270
Q1: 80021
median: 144132
Q3: 164468
95-th percentile: 302149
Maximum: 302149
Range: 298506
Interquartile range (IQR): 84447

Descriptive statistics

Standard deviation: 98934.80855
Coefficient of variation (CV): 0.6801038918
Kurtosis: -0.8910507885
Mean: 145470.1403
Median Absolute Deviation (MAD): 64111
Skewness: 0.2995650954
Sum: 9319980948
Variance: 9788096343
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=15)
Value             Count  Frequency (%)
144132            19865  31.0%
302149            13967  21.8%
164468            9361   14.6%
80021             5186   8.1%
17675             2002   3.1%
14978             1868   2.9%
9270              1841   2.9%
24701             1762   2.8%
12994             1513   2.4%
12421             1394   2.2%
Other values (5)  5309   8.3%

Value  Count  Frequency (%)
3643   682    1.1%
6133   864    1.3%
7044   1334   2.1%
9270   1841   2.9%
12421  1394   2.2%
12994  1513   2.4%
14978  1868   2.9%
17675  2002   3.1%
22157  1380   2.2%
24701  1762   2.8%

Value   Count  Frequency (%)
302149  13967  21.8%
164468  9361   14.6%
144132  19865  31.0%
80021   5186   8.1%
69995   1049   1.6%
24701   1762   2.8%
22157   1380   2.2%
17675   2002   3.1%
14978   1868   2.9%
12994   1513   2.4%

Holiday
Categorical

HIGH CORRELATION

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.6 MiB
-: 62825
Christmas Day: 314
Thanksgiving Day: 146
Memorial Day: 120
Independence Day: 113
Other values (6): 550

Length

Max length: 37
Median length: 1
Mean length: 1.291892989
Min length: 1

Characters and Unicode

Total characters: 82769
Distinct characters: 40
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: -
2nd row: -
3rd row: -
4th row: -
5th row: -

Value                                  Count  Frequency (%)
-                                      62825  98.1%
Christmas Day                          314    0.5%
Thanksgiving Day                       146    0.2%
Memorial Day                           120    0.2%
Independence Day                       113    0.2%
Veterans Day                           105    0.2%
New Year Day                           103    0.2%
Presidents Day (Washingtons Birthday)  100    0.2%
Martin Luther King Jr. Day             100    0.2%
Labor Day                              74     0.1%

Histogram of lengths of the category
ValueCountFrequency (%)
62825
95.3%
day1243
 
1.9%
christmas314
 
0.5%
thanksgiving146
 
0.2%
memorial120
 
0.2%
independence113
 
0.2%
veterans105
 
0.2%
new103
 
0.2%
year103
 
0.2%
jr100
 
0.2%
Other values (8)742
 
1.1%

Most occurring characters

ValueCountFrequency (%)
-62825
75.9%
a2405
 
2.9%
1846
 
2.2%
s1347
 
1.6%
y1343
 
1.6%
e1288
 
1.6%
D1243
 
1.5%
n1236
 
1.5%
i1226
 
1.5%
r1216
 
1.5%
Other values (30)6794
 
8.2%

Most occurring categories

ValueCountFrequency (%)
Dash Punctuation62825
75.9%
Lowercase Letter14709
 
17.8%
Uppercase Letter3089
 
3.7%
Space Separator1846
 
2.2%
Open Punctuation100
 
0.1%
Close Punctuation100
 
0.1%
Other Punctuation100
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
a2405
16.4%
s1347
9.2%
y1343
9.1%
e1288
8.8%
n1236
8.4%
i1226
8.3%
r1216
8.3%
t919
 
6.2%
h760
 
5.2%
m502
 
3.4%
Other values (11)2467
16.8%
ValueCountFrequency (%)
D1243
40.2%
C382
 
12.4%
M220
 
7.1%
L174
 
5.6%
T146
 
4.7%
I113
 
3.7%
V105
 
3.4%
N103
 
3.3%
Y103
 
3.3%
P100
 
3.2%
Other values (4)400
 
12.9%
ValueCountFrequency (%)
-62825
100.0%
ValueCountFrequency (%)
1846
100.0%
ValueCountFrequency (%)
(100
100.0%
ValueCountFrequency (%)
)100
100.0%
ValueCountFrequency (%)
.100
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common64971
78.5%
Latin17798
 
21.5%

Most frequent character per script

ValueCountFrequency (%)
a2405
13.5%
s1347
 
7.6%
y1343
 
7.5%
e1288
 
7.2%
D1243
 
7.0%
n1236
 
6.9%
i1226
 
6.9%
r1216
 
6.8%
t919
 
5.2%
h760
 
4.3%
Other values (25)4815
27.1%
ValueCountFrequency (%)
-62825
96.7%
1846
 
2.8%
(100
 
0.2%
)100
 
0.2%
.100
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII82769
100.0%

Most frequent character per block

ValueCountFrequency (%)
-62825
75.9%
a2405
 
2.9%
1846
 
2.2%
s1347
 
1.6%
y1343
 
1.6%
e1288
 
1.6%
D1243
 
1.5%
n1236
 
1.5%
i1226
 
1.5%
r1216
 
1.5%
Other values (30)6794
 
8.2%

Profit
Real number (ℝ)

Distinct: 55618
Distinct (%): 86.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 62.90747295
Minimum: -220.06
Maximum: 1119.48
Zeros: 1
Zeros (%): < 0.1%
Memory size: 500.7 KiB

Quantile statistics

Minimum: -220.06
5-th percentile: -20.7233
Q1: 10.3175
median: 40.676
Q3: 94.55575
95-th percentile: 220.143
Maximum: 1119.48
Range: 1339.54
Interquartile range (IQR): 84.23825

Descriptive statistics

Standard deviation: 80.22762043
Coefficient of variation (CV): 1.275327345
Kurtosis: 7.273451088
Mean: 62.90747295
Median Absolute Deviation (MAD): 35.988
Skewness: 1.913260551
Sum: 4030355.977
Variance: 6436.47108
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
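
Profit is the most asymmetric numeric variable in the report, with positive skewness and kurtosis well above zero, which points to a long right tail. The sketch below computes both statistics from raw values, assuming the report uses the bias-corrected sample estimators (with kurtosis expressed as excess kurtosis, so a normal distribution scores 0). The function name and toy inputs are illustrative only.

```python
import math
from typing import Sequence, Tuple


def sample_skew_kurtosis(values: Sequence[float]) -> Tuple[float, float]:
    """Bias-corrected sample skewness and excess kurtosis (needs n >= 4)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments about the mean.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5          # uncorrected skewness
    g2 = m4 / m2 ** 2 - 3.0      # uncorrected excess kurtosis
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return skew, kurt


# Toy inputs (hypothetical): one symmetric sample, one with a right tail.
symmetric_skew, symmetric_kurt = sample_skew_kurtosis([1, 2, 3, 4, 5])
tailed_skew, _ = sample_skew_kurtosis([1, 1, 1, 10])
print(symmetric_skew, symmetric_kurt, tailed_skew)
```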
Value                 Count  Frequency (%)
6.19                  9      < 0.1%
5.56                  8      < 0.1%
8.34                  7      < 0.1%
34.06                 7      < 0.1%
18.56                 7      < 0.1%
-0.22                 7      < 0.1%
0.56                  7      < 0.1%
13.31                 7      < 0.1%
32.31                 7      < 0.1%
21.43                 7      < 0.1%
Other values (55608)  63995  99.9%

Value     Count  Frequency (%)
-220.06   1      < 0.1%
-198.698  1      < 0.1%
-168.985  1      < 0.1%
-164.04   1      < 0.1%
-160.536  1      < 0.1%
-153.25   1      < 0.1%
-150.38   1      < 0.1%
-148.586  1      < 0.1%
-147.477  1      < 0.1%
-144.68   1      < 0.1%

Value    Count  Frequency (%)
1119.48  1      < 0.1%
1056.11  1      < 0.1%
1039.08  1      < 0.1%
982.59   1      < 0.1%
971.17   1      < 0.1%
907.92   1      < 0.1%
900.804  1      < 0.1%
868.71   1      < 0.1%
867.04   1      < 0.1%
827.54   1      < 0.1%
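
The report does not state how Profit was derived, but its summary figures are consistent with Profit being Price Charged minus Cost of Trip on every row. The check below compares the reported means and sums of the three variables; it is an arithmetic consistency check under that assumption, not proof of the derivation.

```python
# Means and sums copied from the three variable sections of the report.
PRICE_MEAN, PRICE_SUM = 311.0193337, 19926386.67
COST_MEAN, COST_SUM = 248.1118607, 15896030.69
PROFIT_MEAN, PROFIT_SUM = 62.90747295, 4030355.977

# If Profit = Price Charged - Cost of Trip row by row, the same relation
# must hold for the column means and the column sums.
mean_gap = (PRICE_MEAN - COST_MEAN) - PROFIT_MEAN
sum_gap = (PRICE_SUM - COST_SUM) - PROFIT_SUM

print(f"mean gap: {mean_gap:.2e}")
print(f"sum gap:  {sum_gap:.2e}")
```

Both gaps are within the rounding of the printed figures.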

Year
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 MiB
2017.0: 22946
2018.0: 22135
2016.0: 18987

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 384408
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2016.0
2nd row: 2016.0
3rd row: 2016.0
4th row: 2016.0
5th row: 2016.0

Value   Count  Frequency (%)
2017.0  22946  35.8%
2018.0  22135  34.5%
2016.0  18987  29.6%

Histogram of lengths of the category

Value   Count  Frequency (%)
2017.0  22946  35.8%
2018.0  22135  34.5%
2016.0  18987  29.6%
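
Year is profiled as a categorical variable whose labels are six-character strings such as "2016.0", which indicates the column held float-formatted text rather than integers. A minimal clean-up sketch follows; `normalise_year` is a hypothetical helper, and it rejects labels that are not whole numbers instead of silently truncating them.

```python
def normalise_year(label: str) -> int:
    """Convert a float-formatted year label such as '2016.0' to an int."""
    value = float(label)
    if not value.is_integer():
        raise ValueError(f"not a whole year: {label!r}")
    return int(value)


# The three labels that appear in the Year section.
years = [normalise_year(label) for label in ["2016.0", "2017.0", "2018.0"]]
print(years)
```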

Most occurring characters

ValueCountFrequency (%)
0128136
33.3%
264068
16.7%
164068
16.7%
.64068
16.7%
722946
 
6.0%
822135
 
5.8%
618987
 
4.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number320340
83.3%
Other Punctuation64068
 
16.7%

Most frequent character per category

ValueCountFrequency (%)
0128136
40.0%
264068
20.0%
164068
20.0%
722946
 
7.2%
822135
 
6.9%
618987
 
5.9%
ValueCountFrequency (%)
.64068
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common384408
100.0%

Most frequent character per script

ValueCountFrequency (%)
0128136
33.3%
264068
16.7%
164068
16.7%
.64068
16.7%
722946
 
6.0%
822135
 
5.8%
618987
 
4.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII384408
100.0%

Most frequent character per block

ValueCountFrequency (%)
0128136
33.3%
264068
16.7%
164068
16.7%
.64068
16.7%
722946
 
6.0%
822135
 
5.8%
618987
 
4.9%

Month
Real number (ℝ≥0)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.8820472
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Memory size: 500.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 6
median: 9
Q3: 11
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.33981261
Coefficient of variation (CV): 0.4237240054
Kurtosis: -0.7716730064
Mean: 7.8820472
Median Absolute Deviation (MAD): 2
Skewness: -0.5901321009
Sum: 504987
Variance: 11.15434827
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
Value             Count  Frequency (%)
12                9041   14.1%
11                8556   13.4%
10                8205   12.8%
9                 7246   11.3%
8                 5895   9.2%
7                 4821   7.5%
6                 4339   6.8%
1                 3800   5.9%
5                 3667   5.7%
3                 3087   4.8%
Other values (2)  5411   8.4%

Value  Count  Frequency (%)
1      3800   5.9%
2      2433   3.8%
3      3087   4.8%
4      2978   4.6%
5      3667   5.7%
6      4339   6.8%
7      4821   7.5%
8      5895   9.2%
9      7246   11.3%
10     8205   12.8%

Value  Count  Frequency (%)
12     9041   14.1%
11     8556   13.4%
10     8205   12.8%
9      7246   11.3%
8      5895   9.2%
7      4821   7.5%
6      4339   6.8%
5      3667   5.7%
4      2978   4.6%
3      3087   4.8%

Day of Week
Categorical

HIGH CORRELATION

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.9 MiB
Friday
14580 
Saturday
13738 
Sunday
12491 
Thursday
7237 
Monday
5359 
Other values (2)
10663 

Length

Max length9
Median length6
Mean length6.987560092
Min length6

Characters and Unicode

Total characters447679
Distinct characters17
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowSaturday
2nd rowSaturday
3rd rowSaturday
4th rowSaturday
5th rowSaturday
ValueCountFrequency (%)
Friday14580
22.8%
Saturday13738
21.4%
Sunday12491
19.5%
Thursday7237
11.3%
Monday5359
 
8.4%
Tuesday5334
 
8.3%
Wednesday5329
 
8.3%
2021-02-27T23:16:45.305543image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
Histogram of lengths of the category
2021-02-27T23:16:45.434196image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
Value | Count | Frequency (%)
friday | 14580 | 22.8%
saturday | 13738 | 21.4%
sunday | 12491 | 19.5%
thursday | 7237 | 11.3%
monday | 5359 | 8.4%
tuesday | 5334 | 8.3%
wednesday | 5329 | 8.3%

Most occurring characters

Value | Count | Frequency (%)
a | 77806 | 17.4%
d | 69397 | 15.5%
y | 64068 | 14.3%
u | 38800 | 8.7%
r | 35555 | 7.9%
S | 26229 | 5.9%
n | 23179 | 5.2%
s | 17900 | 4.0%
e | 15992 | 3.6%
F | 14580 | 3.3%
Other values (7) | 64173 | 14.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 383611 | 85.7%
Uppercase Letter | 64068 | 14.3%
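
The length and character statistics above follow directly from the Day of Week category counts. As a rough sketch, they can be reproduced with the standard-library `unicodedata` module, whose `category` function returns general-category codes such as `Ll` (lowercase letter) and `Lu` (uppercase letter). The variable names are illustrative, not taken from the report's own code.

```python
import unicodedata
from collections import Counter

# Category counts copied from the Day of Week frequency table above.
day_counts = {
    "Friday": 14580, "Saturday": 13738, "Sunday": 12491, "Thursday": 7237,
    "Monday": 5359, "Tuesday": 5334, "Wednesday": 5329,
}

char_counts = Counter()
category_counts = Counter()
for day, count in day_counts.items():
    for ch in day:
        # Each character occurs once per row holding this day name.
        char_counts[ch] += count
        category_counts[unicodedata.category(ch)] += count

n_rows = sum(day_counts.values())
total_chars = sum(char_counts.values())
mean_length = total_chars / n_rows

print(total_chars, len(char_counts), dict(category_counts))
print(char_counts.most_common(3))
```

The totals can be compared with the "Total characters", "Distinct characters" and "Most occurring categories" figures in the report.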

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 77806 | 20.3%
d | 69397 | 18.1%
y | 64068 | 16.7%
u | 38800 | 10.1%
r | 35555 | 9.3%
n | 23179 | 6.0%
s | 17900 | 4.7%
e | 15992 | 4.2%
i | 14580 | 3.8%
t | 13738 | 3.6%
Other values (2) | 12596 | 3.3%

Uppercase Letter
Value | Count | Frequency (%)
S | 26229 | 40.9%
F | 14580 | 22.8%
T | 12571 | 19.6%
M | 5359 | 8.4%
W | 5329 | 8.3%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 447679 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
a | 77806 | 17.4%
d | 69397 | 15.5%
y | 64068 | 14.3%
u | 38800 | 8.7%
r | 35555 | 7.9%
S | 26229 | 5.9%
n | 23179 | 5.2%
s | 17900 | 4.0%
e | 15992 | 3.6%
F | 14580 | 3.3%
Other values (7) | 64173 | 14.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 447679 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
a | 77806 | 17.4%
d | 69397 | 15.5%
y | 64068 | 14.3%
u | 38800 | 8.7%
r | 35555 | 7.9%
S | 26229 | 5.9%
n | 23179 | 5.2%
s | 17900 | 4.0%
e | 15992 | 3.6%
F | 14580 | 3.3%
Other values (7) | 64173 | 14.3%

Interactions

[Figures: 132 pairwise interaction plots]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the slope of the line affects only the sign of r, not its magnitude.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
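
The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the report's implementation; the function name `pearson_r` and the sample data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1 / (n - 1) factors of covariance and variances cancel out.
    return cov / math.sqrt(var_x * var_y)


# Hypothetical illustrative data, not taken from the report.
distance = [2.0, 5.0, 9.0, 14.0, 20.0]
cost = [24.0, 61.0, 104.0, 170.0, 236.0]

r = pearson_r(distance, cost)
# Shifting and rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([3.0 * d + 10.0 for d in distance], cost)

print(r, r_rescaled)
```

The second call illustrates the invariance under changes of location and scale mentioned in the description.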

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
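
A minimal sketch of this rank-based calculation follows, assuming tied values receive the average of their ranks (a common convention, not stated in the report). The helper names `average_ranks`, `pearson_r` and `spearman_rho` are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y is a nonlinear but strictly increasing function of x.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)
r_linear = pearson_r(x, y)
print(rho, r_linear)
```

Because the relationship is monotonic but not linear, ρ reaches 1 while Pearson's r stays below 1.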

Kendall's τ

Similarly to Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
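
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements it directly, assuming that tied pairs count as neither concordant nor discordant. Statistical libraries often apply a tie correction (tau-b) instead, so their results can differ when ties are present. The function name and data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical data: 8 concordant and 2 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]

tau = kendall_tau_a(x, y)
print(tau)
```

This brute-force version inspects every pair, so it is only practical for small samples; it is meant to make the definition concrete rather than to be efficient.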

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the phik package documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
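
As a rough illustration, the bias-corrected coefficient can be computed from a contingency table as sketched below. The formula follows the commonly cited form of Bergsma's correction; the function name, the handling of degenerate tables, and the example tables are assumptions made for this sketch, not details taken from the report.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        # Assumption: a variable with a single level carries no association.
        return 0.0
    return math.sqrt(phi2_corr / denom)


# Hypothetical 2x2 tables, not taken from the report.
independent = [[10, 20], [20, 40]]
perfect = [[50, 0], [0, 50]]

v_independent = cramers_v_corrected(independent)
v_perfect = cramers_v_corrected(perfect)
print(v_independent, v_perfect)
```

The degenerate-table branch matters for data like this report's, where one categorical variable is constant and the uncorrected formula would divide by zero.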

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

(index) | df_index | Transaction ID | Date of Travel | Company | City | KM Travelled | Price Charged | Cost of Trip | Customer ID | Payment_Mode | Gender | Age | Income (USD/Month) | Population | Users | Holiday | Profit | Year | Month | Day of Week
0 | 8 | 10000163.0 | 2016-01-02 | Pink Cab | PHOENIX AZ | 4.44 | 71.57 | 48.840 | 22557.0 | Cash | Male | 38.0 | 8808.0 | 943999.0 | 6133.0 | - | 22.730 | 2016.0 | 1.0 | Saturday
1 | 9 | 10000164.0 | 2016-01-02 | Pink Cab | PHOENIX AZ | 8.55 | 114.15 | 89.775 | 22469.0 | Card | Male | 37.0 | 4378.0 | 943999.0 | 6133.0 | - | 24.375 | 2016.0 | 1.0 | Saturday
2 | 14 | 10000149.0 | 2016-01-02 | Pink Cab | NEW YORK NY | 32.64 | 498.60 | 349.248 | 533.0 | Card | Male | 52.0 | 15974.0 | 8405837.0 | 302149.0 | - | 149.352 | 2016.0 | 1.0 | Saturday
3 | 19 | 10000092.0 | 2016-01-02 | Pink Cab | LOS ANGELES CA | 37.76 | 851.25 | 438.016 | 8927.0 | Card | Male | 19.0 | 17197.0 | 1595037.0 | 144132.0 | - | 413.234 | 2016.0 | 1.0 | Saturday
4 | 30 | 10000041.0 | 2016-01-02 | Pink Cab | CHICAGO IL | 35.02 | 598.43 | 406.232 | 4289.0 | Card | Male | 19.0 | 28719.0 | 1955130.0 | 164468.0 | - | 192.198 | 2016.0 | 1.0 | Saturday
5 | 45 | 10000070.0 | 2016-01-02 | Pink Cab | DENVER CO | 7.02 | 61.30 | 82.836 | 30718.0 | Cash | Male | 52.0 | 20255.0 | 754233.0 | 12421.0 | - | -21.536 | 2016.0 | 1.0 | Saturday
6 | 48 | 10000171.0 | 2016-01-02 | Pink Cab | SAN DIEGO CA | 14.28 | 269.15 | 147.084 | 20687.0 | Cash | Male | 39.0 | 8926.0 | 959307.0 | 69995.0 | - | 122.066 | 2016.0 | 1.0 | Saturday
7 | 54 | 10000201.0 | 2016-01-02 | Pink Cab | SAN DIEGO CA | 31.68 | 623.77 | 370.656 | 18490.0 | Card | Male | 24.0 | 10573.0 | 959307.0 | 69995.0 | - | 253.114 | 2016.0 | 1.0 | Saturday
8 | 55 | 10000145.0 | 2016-01-02 | Pink Cab | NEW YORK NY | 2.10 | 37.18 | 21.420 | 502.0 | Cash | Male | 28.0 | 15285.0 | 8405837.0 | 302149.0 | - | 15.760 | 2016.0 | 1.0 | Saturday
9 | 65 | 10000067.0 | 2016-01-02 | Pink Cab | DALLAS TX | 33.32 | 308.58 | 386.512 | 25247.0 | Cash | Male | 26.0 | 24178.0 | 942908.0 | 22157.0 | - | -77.932 | 2016.0 | 1.0 | Saturday

Last rows

(index) | df_index | Transaction ID | Date of Travel | Company | City | KM Travelled | Price Charged | Cost of Trip | Customer ID | Payment_Mode | Gender | Age | Income (USD/Month) | Population | Users | Holiday | Profit | Year | Month | Day of Week
64058 | 359362 | 10433204.0 | 2018-12-31 | Pink Cab | CHICAGO IL | 8.82 | 106.33 | 101.430 | 4162.0 | Cash | Female | 53.0 | 22813.0 | 1955130.0 | 164468.0 | - | 4.900 | 2018.0 | 12.0 | Monday
64059 | 359364 | 10433547.0 | 2018-12-31 | Pink Cab | NEW YORK NY | 27.81 | 417.41 | 333.720 | 1779.0 | Card | Male | 18.0 | 15039.0 | 8405837.0 | 302149.0 | - | 83.690 | 2018.0 | 12.0 | Monday
64060 | 359368 | 10436908.0 | 2018-12-31 | Pink Cab | LOS ANGELES CA | 41.80 | 553.77 | 422.180 | 6564.0 | Cash | Male | 19.0 | 9101.0 | 1595037.0 | 144132.0 | - | 131.590 | 2018.0 | 12.0 | Monday
64061 | 359369 | 10433309.0 | 2018-12-31 | Pink Cab | LOS ANGELES CA | 10.70 | 128.00 | 119.840 | 8175.0 | Card | Male | 24.0 | 12571.0 | 1595037.0 | 144132.0 | - | 8.160 | 2018.0 | 12.0 | Monday
64062 | 359384 | 10436745.0 | 2018-12-31 | Pink Cab | CHICAGO IL | 23.98 | 307.86 | 270.974 | 4948.0 | Card | Male | 65.0 | 2361.0 | 1955130.0 | 164468.0 | - | 36.886 | 2018.0 | 12.0 | Monday
64063 | 359385 | 10433590.0 | 2018-12-31 | Pink Cab | ORANGE COUNTY | 6.36 | 81.86 | 67.416 | 17591.0 | Card | Male | 30.0 | 24699.0 | 1030185.0 | 12994.0 | - | 14.444 | 2018.0 | 12.0 | Monday
64064 | 359386 | 10433494.0 | 2018-12-31 | Pink Cab | NEW YORK NY | 37.12 | 600.00 | 408.320 | 1677.0 | Card | Male | 57.0 | 12975.0 | 8405837.0 | 302149.0 | - | 191.680 | 2018.0 | 12.0 | Monday
64065 | 359388 | 10433435.0 | 2018-12-31 | Pink Cab | MIAMI FL | 2.30 | 29.53 | 23.920 | 9774.0 | Cash | Female | 33.0 | 14322.0 | 1339155.0 | 17675.0 | - | 5.610 | 2018.0 | 12.0 | Monday
64066 | 359389 | 10436696.0 | 2018-12-31 | Pink Cab | BOSTON MA | 27.55 | 377.85 | 330.600 | 60000.0 | Cash | Female | 27.0 | 20303.0 | 248968.0 | 80021.0 | - | 47.250 | 2018.0 | 12.0 | Monday
64067 | 359390 | 10433418.0 | 2018-12-31 | Pink Cab | LOS ANGELES CA | 2.34 | 29.21 | 25.038 | 7650.0 | Card | Female | 32.0 | 17629.0 | 1595037.0 | 144132.0 | - | 4.172 | 2018.0 | 12.0 | Monday